The main purpose of this hands on activity is to familiarize the participant with the different packages available to download, preprocess and visualize pharmacogenomics data. In particular, we will be focusing on two R packages, PharmacoGx and GRmetrics, and we will reference the GDSCTools python package for further exploration on your own, if you prefer to work in python. The following code chunk is not evaluated, but included in case you wish to run this on your own machine: it would install all the packages used in this markdown document.

## Load Bioconductor Installer
install.packages("BiocManager")
library(BiocManager)
## install function provided by BioC installs across Bioconductor and CRAN
install(c("PharmacoGx", "Biobase", "GRmetrics", "ggplot2", "reshape2"))

PharmacoGx

PharmacoGx was developed to serve as a platform for integrating analysis across different datasets. The main focus of the r BiocStyle::Biocpkg("PharmacoGx") package is to provide access to pharmacogenomics datasets that have been preprocessed with a uniform pipeline, and extensively curated to ensure maximum overlap and consistency. For this purpose, PharmacoGx introduces new data structure called a PharmacoSet, or PSet for short, for storing pharmacogenomics studies, based on a level of abstraction from the raw experimental data, and allows bioinformaticians and biologists to work with data at the level of genes, drugs and cell lines. This provides a more intuitive interface and, in combination with unified curation, simplifies analyses between multiple datasets. The purpose of this section is to familiarize the reader with the data structures and utilites provided by PharamcoGx, to most quickly access pharmacogenomics data of interest.

We start by installing the PharmacoGx package, and then introduce the PSet object structure, key functions to access data from downloaded datasets, and some of the functionality implemented in the package for extracting and summarizing pharmacogenomics data.

PharmacoGx is available from the Bioconductor project. To install PharmacoGx, we use the r BiocStyle::CRANpkg("BiocManager") library:

library(BiocManager)
install("Biobase")
install("PharmacoGx") 

Load PharmacoGx into your current workspace:

suppressPackageStartupMessages({
  library(PharmacoGx, verbose=FALSE)
  library(Biobase, verbose=FALSE)
})

Overview of the PharmacoSet

PharmacoGx was made to unify the preprocessing and annotation between public Pharmacogenomics studies, so that the analyst can spend less time in data sanitization, and skip straight to the analysis. As such, PharmacoSets for major datasets are available for download from our servers. These PSets all went through a unified Preprocessing, QC, Normalization and Annotation Pipeline.

Downloading PharmacoSet objects

Let us try to download a PSet onto our local working environment. A table of available PharmacoSet objects can be obtained by using the availablePSets function. Any of the PharmacoSets in the table can then be downloaded by calling downloadPSet, which saves the datasets into a directory of the users choice, and returns the data into the R session. Lets try downloading the GDSC and CCLE PSets.

  availablePSets(saveDir=file.path(".", "PSets"))
  GDSC <- downloadPSet("GDSC", saveDir=file.path(".", "PSets"))
  CCLE <- downloadPSet("CCLE", saveDir=file.path(".", "PSets"))

The basic structure of a PharmacoSet is as follows:

Working with PSets

For most of your work with PSet objects, we recommend you use the accessor functions implemented in the package to access the data stored in the object. For example, here we can use the cellInfo function to pull out the tissue of origin for each cell line in the CCLE dataset. The tissueid column is standard for PSets, and should be present in all objects.

mycol <- c("#8dd3c7","#ffffb3","#bebada","#fb8072","#80b1d3","#fdb462",
           "#b3de69","#fccde5","#d9d9d9","#bc80bd","#ccebc5","#ffed6f",
           "#a6cee3","#1f78b4","#b2df8a","#33a02c","#fb9a99","#e31a1c",
           "#fdbf6f","#ff7f00","#cab2d6","#6a3d9a","#ffff99","#b15928")
pie(table(CCLE@cell[,"tissueid"]), 
    col=mycol, 
    main="Tissue types", 
    radius=1, 
    cex=0.8)
Tissue of origin of cell lines in CCLE study

Tissue of origin of cell lines in CCLE study

While we will introduce some of the functions for working with PSets in this tutorial, the documentation of the PharmacoSet object contains descriptions of the available functions to interface with PSet objects:

?`PharmacoSet-class`

PSets can be subsetted by refering directly to the drugs and cell lines you want to keep in the dataset. For example, we can subset the GDSC PSet in the following ways:

print(GDSC)
## Name:  GDSC 
## Date Created:  Wed Dec 30 10:44:21 2015 
## Number of cell lines:  1124 
## Number of drug compounds:  139 
## RNA: 
##  Dim:  11833 789 
## CNV: 
##  Dim:  24960 936 
## Drug pertubation: 
##  Please look at pertNumber(pSet) to determine number of experiments for each drug-cell combination.
## Drug sensitivity: 
##  Number of Experiments:  79903 
##  Please look at sensNumber(pSet) to determine number of experiments for each drug-cell combination.
print(drugNames(GDSC)[1])
## [1] "Erlotinib"
print(subsetTo(GDSC, drugs="Erlotinib"))
## Name:  GDSC 
## Date Created:  Wed Dec 30 10:44:21 2015 
## Number of cell lines:  1124 
## Number of drug compounds:  1 
## RNA: 
##  Dim:  11833 789 
## CNV: 
##  Dim:  24960 936 
## Drug pertubation: 
##  Please look at pertNumber(pSet) to determine number of experiments for each drug-cell combination.
## Drug sensitivity: 
##  Number of Experiments:  323 
##  Please look at sensNumber(pSet) to determine number of experiments for each drug-cell combination.
print(GDSC["YT","Erlotinib"])
## Name:  GDSC 
## Date Created:  Wed Dec 30 10:44:21 2015 
## Number of cell lines:  1 
## Number of drug compounds:  1 
## RNA: 
##  Dim:  11833 0 
## CNV: 
##  Dim:  24960 1 
## Drug pertubation: 
##  Please look at pertNumber(pSet) to determine number of experiments for each drug-cell combination.
## Drug sensitivity: 
##  Number of Experiments:  1 
##  Please look at sensNumber(pSet) to determine number of experiments for each drug-cell combination.

As an exercise, lets manually compute the number of overlaping cell lines and drugs between CCLE and GDSC. Try it yourself!

If you noticed as you completed the previous exercise, cell line and drug names are standardized between PSets downloaded using the downloadPSet function.

PharmacoGx also implements a convenience function to intersect PSets:

common <- intersectPSet(list(CCLE, GDSC), intersectOn=c("drugs", "cell.lines"))
print(common)

PharmacoGx Functionality

In addition to defining the PSet data structure, PharmacoGx implements some basic preprocessing, plotting and analysis functions, designed to help the user create objects for use in biomarker discovery or machine learning tasks.

Fitting Drug Dose Response Curves

One of the core tasks implemented in PharmacoGx is the fitting of Hill Curve Models to dose-response data. In PharmacoGx, we use the 3 Parameter Hill Slope function as our model of drug response in cancer cell lines: To fit a Hill Slope model to you data, you can use the logLogisticRegression function, as below:

concentrations <- 1/2^seq(0,8) * 1
viabilities <-  c(0, 33.3, 60, 77.8, 88.2, 93.9, 96.9, 98.4, 99.2)

pars <- logLogisticRegression(conc = concentrations, viability = viabilities)
## Warning in sanitizeInput(conc = conc, viability = viability, conc_as_log
## = conc_as_log, : Concentration Values were unsorted. Sorting concentration
## and ordering viability in same order
print(pars)
## $HS
## [1] 1.900066
## 
## $E_inf
## [1] 0
## 
## $EC50
## [1] 0.2813364

Plotting Drug Dose Response Data

Drug-Dose response data included in the PharmacoSet objects can be conviniently plotted using the drugDoseResponseCurve function. Given a list of PharmacoSets, a drug name and a cell name, it will plot the drug dose response curves for the given cell-drug combination in each dataset, allowing direct comparisons of data between datasets.

cells <- c("OCI-AML2","A253","NCI-H1648")
par(mfrow=c(1, 3), pty = "s")
drugDoseResponseCurve(drug="lapatinib", cellline=cells[1], 
                      pSets=list(CCLE, GDSC), plot.type="Fitted", 
                      legends.label="auc_published")
drugDoseResponseCurve(drug="lapatinib", cellline=cells[2], 
                      pSets=list(CCLE, GDSC), plot.type="Fitted", 
                      legends.label="auc_published")
drugDoseResponseCurve(drug="lapatinib", cellline=cells[3], 
                      pSets=list(CCLE, GDSC), plot.type="Fitted", 
                      legends.label="auc_published")

The function drugDoseResponseCurve can also be used to plot your own drug dose response curves, as follows:

concentrations <- 1/2^seq(0,8) * 1
viabilities <-  c(0, 33.3, 60, 77.8, 88.2, 93.9, 96.9, 98.4, 99.2)

drugDoseResponseCurve(concentrations = list("Exp 1" = concentrations), viabilities = list("Exp 1" = viabilities))
## Warning in sanitizeInput(concentrations[[i]], viabilities[[i]], conc_as_log
## = conc_as_log, : Concentration Values were unsorted. Sorting concentration
## and ordering viability in same order

Computing Summary Measures for DDRCs

Often with in vitro Pharmacogenomics data, we want to compare the drug sensitivity of a cell line to some omic feature. For this, we want to summarize the drug dose response curve into a single number representing the sensitivity of the cell line. The IC50 and Area Above the curve are two convenient metrics for quuantifying the observed drug sensitivity. If you noticed above, the drugDoseResponseCurve function computes them by default. In PharmacoGx, they can be computed manually as follows:

concentrations <- rev(1/2^seq(0,8) * 1)
viabilities <-  c(99.2, 98.4, 96.9, 93.9, 88.2, 77.8, 60, 33.3, 0)

print(computeAUC(concentration = concentrations, viability = viabilities))
## [1] 23.68416
print(computeIC50(concentration = concentrations, viability = viabilities))
## [1] 0.2813364

In PharmacoGx, we call these measures sensitivity measures. PSets come with these measures precomputed, and accessible using the sensitivityProfiles function:

head(sensitivityProfiles(CCLE))

Summary Functions

Pharmacogenomics studies often contain many examples of either replicated or missing data. One of the most common tasks in preparing data for statistical or machine learning analysis is aligning your features and labels. To accelerate using Pharmacogenomics data for analysis, PharmacoGx contains two functions which create deduplicated matrices with missing data filled by NAs: summarizeSensitivityProfiles and summarizeMolecularProfiles. They create matrices which are drugs x cell lines and molecular features x cell lines, with any cell lines profiled for only sensitivity or molecular features padded with NA values. For the molecular profiles, the data is returned in a r BiocStyle::Biocpkg("Biobase") ExpressionSet object, while drug sensitivity data is returned as a matrix.

dim(PharmacoGx::summarizeMolecularProfiles(CCLE, "rna"))
## Features  Samples 
##    20049     1061
dim(PharmacoGx::summarizeSensitivityProfiles(CCLE))
## [1]   24 1061

Below, we use the summarizeMolecularProfiles function and ggplot to investigate the distributions of AUC values within CCLE.

library(ggplot2, verbose=FALSE)
library(reshape2, verbose=FALSE)
CCLE.auc <- PharmacoGx::summarizeSensitivityProfiles(CCLE, sensitivity.measure = "auc_recomputed")
melted_data <- melt(CCLE.auc)
NA_rows <- unique(which(is.na(melted_data), arr.ind=T)[,1])
melted_data <- melted_data[-NA_rows,]
p <- ggplot(melted_data, aes(x=Var1,y=value)) +
  geom_boxplot(fill="gray") +
  theme(axis.text.x=element_text(angle=90,hjust=1)) +
  xlab("Drugs") +
  ylab("AAC")
print(p)
Cells response to drugs in CCLE

Cells response to drugs in CCLE

Replication

In this section we will investigate the consistency between the GDSC and CCLE datasets. In both CCLE and GDSC, the transcriptome of cells was profiled using an Affymatrix microarray chip. Cells were also tested for their response to increasing concentrations of various compounds, and form this the IC50 and AUC were computed. However, the cell and drugs names used between the two datasets were not consistent. Furthermore, two different microarray platforms were used. However, PharmacoGx allows us to overcome these differences to do a comparative study between these two datasets. GDSC was profiled using the hgu133a platform, while CCLE was profiled with the expanded hgu133plus2 platform. While in this case the hgu133a is almost a strict subset of hgu133plus2 platform, the expression information in PharmacoSet objects is summarized by Ensemble Gene Ids, allowing datasets with different platforms to be directly compared. The probe to gene mapping is done using the BrainArray customCDF for each platform. To begin, you would load the datasets from disk or download them using the downloadPSet function above. We want to investigate the consistency of the data between the two datasets. The common intersection between the datasets can then be found using intersectPSet. We create a summary of the gene expression and drug sensitivity measures for both datasets, so we are left with one gene expression profile and one sensitivity profile per cell line within each dataset. We can then compare the gene expression and sensitivity measures between the datasets using a standard correlation coefficient.

Consistency of mRNA expression

First we look at the consistency of gene expression profiles across the two datasets. We use the same intersected PSets as above, and calculate Pearson correlations between the expression of common cell lines.

common <- intersectPSet(pSets = list("CCLE"=CCLE, "GDSC"=GDSC), 
                        intersectOn = c("cell.lines", "drugs"), 
                        strictIntersect = TRUE)
## Intersecting large PSets may take a long time ...
common.features <- intersect(rownames(featureInfo(common$GDSC, "rna")),
                             rownames(featureInfo(common$CCLE, "rna")))

GDSC.rna <- PharmacoGx::summarizeMolecularProfiles(common$GDSC, "rna", 
                                       features = common.features)
## Summarizing rna molecular data for:  GDSC
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |=========                                                        |  14%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |======================                                           |  34%
  |                                                                       
  |========================                                         |  36%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |==============================                                   |  45%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |===================================                              |  55%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
CCLE.rna <- PharmacoGx::summarizeMolecularProfiles(common$CCLE, "rna",
                                       features = common.features)

cors.gexp <- cor(Biobase::exprs(GDSC.rna), Biobase::exprs(CCLE.rna), use="pairwise.complete")
hist(diag(cors.gexp), main="Correlations of gene expression between same cell lines", xlim=c(-1,1))

boxplot(list(Same=diag(cors.gexp), Different = cors.gexp[upper.tri(cors.gexp)]), col=rainbow(5), main="Comparison of gene expression profiles")

Consistency of pharmacological profiles

Before running the correlation analysis, let us examine some examples of drug sensitivity curves which are discordant and concordant across the two datasets, using the functions described above.

common <- intersectPSet(pSets = list("CCLE"=CCLE, "GDSC"=GDSC), 
                        intersectOn = c("cell.lines", "drugs"), 
                        strictIntersect = TRUE)
## Intersecting large PSets may take a long time ...
drugs <- drugNames(common$CCLE)
##Example of concordant and discordant drug curves
cases <- rbind(
  c("CAL-85-1", "17-AAG"),
  c("HT-29", "PLX4720"),
  c("COLO-320-HSR", "AZD6244"),
  c("HT-1080", "PD-0332991"))
par(mfrow=c(2, 2))
for (i in seq_len(nrow(cases))) {
  drugDoseResponseCurve(pSets=common, 
                        drug=cases[i,2], 
                        cellline=cases[i,1], 
                        legends.label="ic50_published", 
                        plot.type="Fitted", 
                        ylim=c(0,130))
}
Consistency of drug response curves across studies

Consistency of drug response curves across studies

As we can see, there is a range of concordance of drug response between these two datasets. To get a more global picture, we look at the correlations of cell line response between datasets.

GDSC.auc <- PharmacoGx::summarizeSensitivityProfiles(common$GDSC)
CCLE.auc <- PharmacoGx::summarizeSensitivityProfiles(common$CCLE)

corssens <- cor(GDSC.auc, CCLE.auc, use="pairwise.complete")
hist(diag(corssens), main="Correlations of drug sensitivity between same cell lines", breaks=50)

boxplot(list(Same=diag(corssens), Different = corssens[upper.tri(corssens)]), col=rainbow(5), main="Comparison of drug sensitivity profiles")

As we can see, the drug sensitivity data is less concordant between datasets than the gene expression. This presents a challenge for statistical and machine learning, as usually we use the drug response as our target/labels for our algorithms, assuming that they represent a ground truth. As the workshop goes on, we will introduce some suggested techniques to overcome this challenge.

The first method introduces will be the Growth Rate correction techniques implemented in the GRmetrics package.

GRmetrics

The GRmetrics implements the approach published by Hafner et al., to correct for variability in drug sensitivity metrics caused by variations in growth rate of the untreated cells. In this section, we will motivate the need for correcting for growth rate, introduce the method from Hafner et al., and show an example of how the GRmetrics correct for growth rate discrepancies, using simulated data.

Need for GR Correction

To demonstrate the need for Growth Rate correction, we will first note that it is not trivial that differences in growth rate will lead to variability in our drug sensitivity summary metrics.

When we compute the viability for every drug concentration, we are normalizing by the viability of an untreated control, which has been growing beside the treated cells during the duration of the experiment.

A typical method for compute vialibility values would be as follows:

In this case, there is already some implicit normalization to the growth rate of the cell line, as the control well viability measurement will depend on the growth rate.

We will now show that this is not enough, using some simulated drug response data available in the GRmetrics R package. First, let us load the package.

library(GRmetrics)
## Loading required package: SummarizedExperiment
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: DelayedArray
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
## 
##     anyMissing, rowMedians
## Loading required package: BiocParallel
## 
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
## 
##     colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following objects are masked from 'package:base':
## 
##     aperm, apply, rowsum

We use a dataset called InputCaseA. In this dataset, all the data from across experiments is assembled into an R tibble. The vignette vignette(“GRmetrics”) ahs more information about the input cases available in GRmetrics.

data("inputCaseA")
head(inputCaseA)

First, lets calculate the doubling times for each experiment. In this simulated dataset, the perturbation variable denotes a simulated perturbation which modifies the growth rate of the cell lines - for example, a change in growth media. We calculate summary statistics (over replicates and different drugs) for all the doubling times for the cell lines:

dbl.times <- by(inputCaseA, 
                list(inputCaseA$cell_line, inputCaseA$perturbation), 
                function(x){
                  return(x$time/log2(x$cell_count__ctrl / x$cell_count__time0))
                } )

## Creating names for the boxplot:
names(dbl.times) <- outer(rownames(dbl.times), paste(",\n Pert:",colnames(dbl.times)), paste0)

boxplot(dbl.times, las=3, ylab="Doubling Time in Hours")

As we can see, the perturbation effect for BT20 is the strongest in this data. Lets examine what the effect of this perturbation is on common summary metrics such as the IC50 and AUC. For sake of simplicity, we examine only drugA, at the 72 hour timepoint:

pert0 <- as.data.frame(subset(inputCaseA, time == 72 &perturbation == 0 & agent=="drugA" & cell_line=="BT20"))
pert1 <- as.data.frame(subset(inputCaseA, time == 72 &perturbation == 1 & agent=="drugA" & cell_line=="BT20"))

## We first calculate the viability relative to control for each measurement. Since 
## we don't have background measurements, we omit that.
pert0$viability <- pert0$cell_count/pert0$cell_count__ctrl
pert1$viability <- pert1$cell_count/pert1$cell_count__ctrl

drugDoseResponseCurve(concentrations = list("Pert 0" = pert0$concentration,
                                            "Pert 1" = pert1$concentration), 
                      viabilities = list("Pert 0" = pert0$viability,
                                         "Pert 1" =  pert1$viability), 
                      conc_as_log = FALSE, 
                      viability_as_pct = FALSE, 
                      plot.type="Fitted")
## Warning in sanitizeInput(concentrations[[i]], viabilities[[i]], conc_as_log
## = conc_as_log, : Warning: Viability data exceeds negative control.
## Warning in sanitizeInput(concentrations[[i]], viabilities[[i]], conc_as_log
## = conc_as_log, : Concentration Values were unsorted. Sorting concentration
## and ordering viability in same order
## Warning in sanitizeInput(concentrations[[i]], viabilities[[i]], conc_as_log
## = conc_as_log, : Warning: Viability data exceeds negative control.
## Warning in sanitizeInput(concentrations[[i]], viabilities[[i]], conc_as_log
## = conc_as_log, : Concentration Values were unsorted. Sorting concentration
## and ordering viability in same order

As we can see, the difference in doubling times affects both the AAC and the IC50 values. Observing this effect, Hafner et al. introduced a method to correct for such variability.

The GRmetrics Theory

The idea behind the GR metrics is instead of calculating relative cell viabilities, to calculate the relative effect of the drug treatment on the growth rate, and then proceed with the analysis as usual. The basic formula is as follows:

\[GR(c,t) = 2^{\frac{k(c,t)}{k(0)}} - 1\]

Here, \(k(c,t)\) is the growth rate of the cell line treated with the drug at concentration \(c\) and measured at time \(t\). \(k(0)\) is the growth rate of the cell line for the untreated control. In practice, GR values can be estimated using the starting number of cells seeded for each experiment, and the final number of cells at the end of the treatment when viability is measured. If \(x\) is a measure of cell number (either cell counting, or assays such as Cell Titer Glo which are proportional to cell number), then GR values can be calculated as:

\[GR(c) = 2^{\frac{\log_2(x(c)/x_0)}{\log_2(x_{ctrl}/x_0)}} - 1\]

It is important to note that unlike relative viability, GR values lie on a range of [-1,1]. Values \(GR>0\) indicate that the drug is having a cytostatic effect, while values \(GR<0\) indicate a cytotoxic effect. ## Using the GRmetrics package to correct for growth rate discrepancies Let us now see whether this new approach to assessing cell viability can correct for differences in cell division time, and at the same time introduce some of the functionality in the GRmetrics package.

The major function for performing GR metric inference is the GRfit function. This function takes as an argument a dataset, formatted similarly to the example dataset we are using in this section. It then performs the calculation of the GR values, fits a sigmoidal curve to the each experiment, and computes GR metrics (as well as their unmodified counterparts). Lets do so on our example data for the experiments we were examining above:

## groupingVariables determines which columns of the input define a unique
## experiment. Here, we will combine both data frames and have the function
## treat each perturbation as a separate experiment, using 3 determining values,
## the cell_line and agent and perturbation status
GR.fitted <- GRfit(rbind(pert0, pert1), groupingVariables=c("cell_line", "agent", "perturbation"))
## Warning in log(dose): NaNs produced

## Warning in log(dose): NaNs produced

## Warning in log(dose): NaNs produced

## Warning in log(dose): NaNs produced
## We can use functions in GRmetrics directly on the GR.fitted object, to plot 
## the drug dose response curves.
p <- GRdrawDRC(GR.fitted, experiments = "all", plotly = FALSE)
print(p)

## GR.fitted is a SummarizedExperiment object, therefore to get at the actual values, we will use the assay command:

head(assay(GR.fitted))
##                     BT20 drugA 0 BT20 drugA 1
## ctrl_cell_doublings   1.81616644  1.190694090
## GR50                  0.10393908  0.097355116
## GRmax                 0.01715550  0.007704469
## GR_AOC                0.48286829  0.504099009
## GEC50                 0.09904125  0.096308701
## GRinf                 0.02560534  0.006397038

As we can see above, the GR values are stable to the effect of different doubling times of the cells lines in different conditions. We can examine this effect on the whole of the example dataset:

## We run all the experiments in GRfit
suppressWarnings({
  GR.fit.all <- GRfit(inputCaseA, c("cell_line", "agent", "perturbation"))
}) 
# Above warns many times about curve fitting parameters and NaN's produced by logarithms of 0 dose

GRbox(GR.fit.all, "AUC", groupVariable = c("perturbation"),plotly=FALSE, pointColor = "agent")

GRbox(GR.fit.all, "GR_AOC", groupVariable = c("perturbation"),plotly=FALSE, pointColor = "agent")

Overall, we see that the GR_AOC values are much more stable between the two conditions compared to the AUC values. We encourage you to explore the effect on the other metrics as well, and if you want to learn more about the GRmethod, the Nature Methods paper is a great starting point.

Bonus: GDSCTools

Unfortunately, we don’t have time to cover the GDSCTools package in depth. However, we want to bring the reader’s attention to the package, which provides much of the functionality described above within the Python programming language.

Specifically, GDSCTools provides the ability to download the GDSC data as well as input their own, use ANOVA modelling to find novel univariate biomakers of drug response from the preclinical data, do basic multivariate modelling using ElasticNet regression, and visualize all these results.

For interested readers, we encourage them to review the documentation at: https://gdsctools.readthedocs.io/en/master/index.html

References

In this section, we include references to the Tools and Datasets covered above.

PharmacoGx:

Smirnov, P. et al. PharmacoGx: An R package for analysis of large pharmacogenomic datasets. Bioinformatics 32, 1244–1246 (2016).

CCLE Dataset:

Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).

GDSC Dataset (Called CGP at the time of publication):

Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).

BrainArray CDF Files:

Sabatti, C., Karsten, S. L. & Geschwind, D. H. Thresholding rules for recovering a sparse signal from microarray experiments. Math Biosci 176, 17–34 (2002).

GR Corrected Metrics:

Hafner, M., Niepel, M. & Sorger, P. K. Alternative drug sensitivity metrics improve preclinical cancer pharmacogenomics. Nature Biotechnology (2017). doi:10.1038/nbt.3882

Hafner, M., Niepel, M., Chung, M. & Sorger, P. K. Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs. Nat Meth advance online publication, (2016).

Hafner, M., Niepel, M., Chung, M. & Sorger, P. K. Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs. Nature Methods 13, 521–527 (2016).

GDSCTools: Cokelaer, T. et al. GDSCTools for mining pharmacogenomic interactions in cancer. Bioinformatics 34, 1226–1228 (2018).